Morphological Degradation Models and their Use in Document Image Restoration

Authors

  • Tapas Kanungo
  • Qigong Zheng
Abstract

Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated degradation model is used to estimate the probability of an ideal binary pattern, given the noisy observed pattern. This probability is estimated by degrading noise-free document images and then computing the frequency of corresponding noise-free and noisy pattern pairs. This conditional probability is then used to construct a lookup table to restore the noisy images. The impact of the restoration process is then quantified by computing the decrease in OCR word and character error rates. We find that, given the estimated degradation model parameter values, the restoration algorithm decreases the character error rate by 16.1% and the word error rate by 7.35%. In some categories of degradation (e.g. model parameters that give rise to broken characters) there is a 41.5% reduction in character error rate and a 20.4% reduction in word error rate.

This research was funded in part by the Science Applications International Corporation under Contract 4400019848, the Defense Advanced Research Projects Agency under Contract N660010028910, and the National Science Foundation under Grant IIS9987944.
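The lookup-table idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a 3x3 window, restoration of the centre pixel only, and independent random pixel flips standing in for the morphological degradation model (whose parameters the paper estimates from data). The names `window_key`, `flip_noise`, `build_lookup_table` and `restore` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def window_key(img, r, c):
    """Pack the 3x3 neighbourhood of (r, c) into a 9-bit integer.
    Pixels outside the image are treated as 0 (background)."""
    h, w = len(img), len(img[0])
    key = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            bit = img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            key = (key << 1) | bit
    return key

def flip_noise(img, p, rng):
    """Assumed stand-in degradation: flip each pixel independently with probability p."""
    return [[px ^ (rng.random() < p) for px in row] for row in img]

def build_lookup_table(ideal_images, degrade, trials=20):
    """Degrade noise-free images, count (noisy pattern, ideal centre pixel) pairs,
    and map each noisy pattern to its most frequent ideal value."""
    counts = defaultdict(Counter)
    for ideal in ideal_images:
        for _ in range(trials):
            noisy = degrade(ideal)
            for r in range(len(ideal)):
                for c in range(len(ideal[0])):
                    counts[window_key(noisy, r, c)][ideal[r][c]] += 1
    return {key: ctr.most_common(1)[0][0] for key, ctr in counts.items()}

def restore(noisy, table):
    """Replace each pixel by the table entry for its neighbourhood.
    Patterns never seen during table construction are left unchanged."""
    return [[table.get(window_key(noisy, r, c), noisy[r][c])
             for c in range(len(noisy[0]))]
            for r in range(len(noisy))]

# Usage: a 16x16 image with a filled square, degraded and then restored.
rng = random.Random(0)
ideal = [[1 if 4 <= c < 12 and 4 <= r < 12 else 0 for c in range(16)]
         for r in range(16)]
table = build_lookup_table([ideal], lambda im: flip_noise(im, 0.05, rng))
restored = restore(flip_noise(ideal, 0.05, rng), table)
```

Choosing the most frequent ideal value for each noisy pattern corresponds to picking the value with the highest estimated conditional probability. A larger window or a more realistic degradation function can be swapped in without changing the table-building logic.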
Technical report LAMP-TR-065 / CAR-TR-962 / CS-TR-4218, February 2001. Language and Media Processing Laboratory, Center for Automation Research, University of Maryland, College Park, MD 20742-3275.

Similar resources

Morphological degradation models and their use in document image restoration

Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated d...

A Downhill Simplex Algorithm for Estimating Morphological Degradation Model Parameters (Tapas Kanungo and Qigong Zheng)

Noise models are crucial for designing image restoration algorithms, generating synthetic training data, and predicting algorithm performance. However, to accomplish any of these tasks, an estimate of the degradation model parameters is essential. In this paper we describe a parameter estimation algorithm for a morphological, binary image degradation model. The inputs to the estimation algorith...

LAMP-TR-065 CAR-TR-962 CS-TR-4218 4400019848

Document images undergo various degradation processes. Numerous models of these degradation processes have been proposed in the literature. In this paper we propose a model-based restoration algorithm. The restoration algorithm first estimates the parameters of a degradation model and then uses the estimated parameters to construct a lookup table for restoring the degraded image. The estimated de...

Power functions and their use in selecting distance functions for document degradation model validation

Two document degradation models that model the perturbations introduced during the document printing and scanning process were proposed recently. Although degradation models are very useful, it is important that we validate these models by comparing the synthetically generated images against real images. In the recent past, two different validation procedures have also been proposed to validat...

A Statistical, Nonparametric Methodology for Document Degradation Model Validation

Printing, photocopying, and scanning processes degrade the image quality of a document. Statistical models of these degradation processes are crucial for document image understanding research. Models allow us to predict system performance, conduct controlled experiments to study the breakdown points of the systems, create large multilingual data sets with groundtruth for training classifiers, ...

Publication date: 2001